sequencing reads hitting a gene of a replicate, i.e., the sequencing count
of a replicate for one gene. Table 6.4 shows such a count matrix used to
study how genes contribute to the airway smooth muscle cytokine
function [Himes, et al., 2014], where there were two experimental
conditions and each condition had four replicates. The objective of
analysing such a sequencing count matrix was to find out which subset of
genes was differentially expressed across the two conditions.
Table 6.4 A count matrix after the sequencing reads have been mapped to a reference
genome. Each count represents the number of times the sequencing reads hit a gene. Gene IDs were
shortened by removing the prefix ENSG00000000 and sample IDs were shortened by
removing the prefix SRR10395. This means the full ID of gene 003 was ENSG00000000003
and the full ID of sample 08 was SRR1039508.
Sample      08     09     12     13     16     17     20     21
           723    486    904    445   1170   1097    806    604
             0      0      0      0      0      0      0      0
           467    523    616    371    582    781    417    509
           347    258    364    237    318    447    330    324
            96     81     73     66    118     94    102     74
             0      0      1      0      2      0      0      0
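The ID shortening described in the caption of Table 6.4 can be reversed by re-attaching the two prefixes. The following is a minimal Python sketch of that step, assuming the counts are held as one list per gene in the order shown in the table. The helper names (full_gene_id, full_sample_id) and the zero-row check are illustrative assumptions and are not taken from the text.

```python
# Prefixes quoted in the caption of Table 6.4.
GENE_PREFIX = "ENSG00000000"
SAMPLE_PREFIX = "SRR10395"


def full_gene_id(short_id: str) -> str:
    """Hypothetical helper: restore a full gene ID from its shortened form."""
    return GENE_PREFIX + short_id


def full_sample_id(short_id: str) -> str:
    """Hypothetical helper: restore a full sample ID from its shortened form."""
    return SAMPLE_PREFIX + short_id


# Shortened sample IDs (the column headers of Table 6.4).
samples = ["08", "09", "12", "13", "16", "17", "20", "21"]

# Counts copied from Table 6.4: one row per gene, one column per sample.
counts = [
    [723, 486, 904, 445, 1170, 1097, 806, 604],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [467, 523, 616, 371, 582, 781, 417, 509],
    [347, 258, 364, 237, 318, 447, 330, 324],
    [96, 81, 73, 66, 118, 94, 102, 74],
    [0, 0, 1, 0, 2, 0, 0, 0],
]

# Sum of the displayed counts for each sample (column totals).
column_totals = [sum(row[j] for row in counts) for j in range(len(samples))]

# Row indices of genes with no reads in any sample; such rows carry no
# information about differential expression (an assumption of this sketch).
all_zero_rows = [i for i, row in enumerate(counts) if sum(row) == 0]

print([full_sample_id(s) for s in samples])
print(full_gene_id("003"))
print(column_totals)
print(all_zero_rows)
```

Only the six rows displayed in the table are used here, so the column totals are not the library sizes of the full experiment.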
Sequencing count data are different from the microarray data as
the data are non-negative integers. Therefore, limma may not be very
suitable. The negative binomial distribution has been employed for
discovering DEGs for the sequencing count data [Robinson, et al., 2009;
Anders and Huber, 2010].
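To illustrate why the negative binomial distribution suits count data, the sketch below evaluates its probability mass function under a mean and dispersion parametrisation, in which the variance is mu + alpha * mu^2 and therefore exceeds the mean whenever alpha > 0. The parametrisation, the symbol alpha, the function names and the example values (mu = 100, alpha = 0.1) are assumptions made for illustration; the text does not state which form the cited methods use.

```python
import math


def nb_log_pmf(k: int, mu: float, alpha: float) -> float:
    """Log-probability of count k under a negative binomial distribution
    with mean mu and dispersion alpha (assumed parametrisation)."""
    r = 1.0 / alpha          # "size" parameter
    p = r / (r + mu)         # success probability
    return (
        math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
        + r * math.log(p) + k * math.log1p(-p)
    )


def nb_moments(mu: float, alpha: float, k_max: int = 5000):
    """Numerically sum the pmf over 0..k_max and return
    (total probability, mean, variance)."""
    probs = [math.exp(nb_log_pmf(k, mu, alpha)) for k in range(k_max + 1)]
    total = sum(probs)
    mean = sum(k * pk for k, pk in enumerate(probs))
    var = sum((k - mean) ** 2 * pk for k, pk in enumerate(probs))
    return total, mean, var


# Illustrative values only, not estimated from Table 6.4.
mu, alpha = 100.0, 0.1
total, mean, var = nb_moments(mu, alpha)

print(total)                      # close to 1
print(mean)                       # close to mu
print(var, mu + alpha * mu ** 2)  # numerical vs. closed-form variance
```

A Poisson model would force the variance to equal the mean; the extra term alpha * mu^2 is what allows the negative binomial to absorb the additional variability between replicates.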
Discover DEGs for sequencing count data using DESeq2
DESeq2 is a package developed for gene differential expression pattern
discovery based on a sequencing count data set [Love, et al., 2014]. The
first thing to do is to generate a design matrix. Table 6.5 shows a design
matrix for the data shown in Table 6.4. In this matrix, the column labelled
by id was for the names of the replicates. The column labelled by dex was
for specifying the two experimental conditions, namely control and treated.
By comparing the column id and the column dex, all samples in